MoEC: Mixture of Expert Clusters
Authors
Abstract
Sparsely Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead. MoE models convert dense layers into sparse experts, and utilize a gated routing network to make experts conditionally activated. However, as the number of experts grows, MoE with outrageous parameters suffers from overfitting and sparse data allocation. Such problems are especially severe on tasks with limited data, thus hindering progress towards improving performance by scaling up. We verify that there exists a performance upper bound while scaling up sparse MoE. In this work, we propose Mixture of Expert Clusters (MoEC), a general approach that enables the experts to learn more diverse and appropriate knowledge by imposing variance-based constraints on the routing stage. Given this, we could further propose a cluster-level expert dropout strategy specifically designed for the cluster structure. Our experiments reveal that MoEC could improve performance on machine translation and natural language understanding tasks. MoEC also plays a positive role in mitigating overfitting and data allocation problems, thus fully releasing the potential of large-scale sparse models.
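The abstract describes the mechanism only at a high level, so the following is a minimal, hypothetical sketch of how a gated MoE layer with expert clusters and cluster-level dropout might look in PyTorch. The class name ClusteredMoELayer, the hyperparameters (n_experts, n_clusters, cluster_drop_p), the top-1 routing, the contiguous cluster assignment, and the particular variance penalty are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ClusteredMoELayer(nn.Module):
    """Illustrative sketch: a top-1 gated MoE layer whose experts are grouped
    into equal-sized clusters, with cluster-level dropout on the routing logits."""

    def __init__(self, d_model, n_experts=8, n_clusters=4, cluster_drop_p=0.25):
        super().__init__()
        assert n_experts % n_clusters == 0
        self.n_clusters = n_clusters
        self.cluster_size = n_experts // n_clusters
        self.cluster_drop_p = cluster_drop_p
        self.gate = nn.Linear(d_model, n_experts)  # gated routing network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def routing_variance_penalty(self, probs):
        # One possible reading of a "variance-based constraint on the routing
        # stage" (an assumption, not necessarily the paper's formulation):
        # penalize the variance of routing probabilities across the experts
        # inside each cluster, pulling clustered experts toward similar roles.
        per_cluster = probs.view(-1, self.n_clusters, self.cluster_size)
        return per_cluster.var(dim=-1).mean()

    def forward(self, x):  # x: (n_tokens, d_model)
        logits = self.gate(x)  # (n_tokens, n_experts)
        aux = self.routing_variance_penalty(F.softmax(logits, dim=-1))

        if self.training:
            # Cluster-level dropout (assumed form): mask the routing logits of
            # whole clusters, so entire groups of experts are skipped at once.
            keep = torch.rand(self.n_clusters, device=x.device) > self.cluster_drop_p
            if not keep.any():  # always keep at least one cluster
                keep[torch.randint(self.n_clusters, (1,), device=x.device)] = True
            expert_mask = keep.repeat_interleave(self.cluster_size)
            logits = logits.masked_fill(~expert_mask, float("-inf"))

        probs = F.softmax(logits, dim=-1)
        top_p, top_idx = probs.max(dim=-1)  # top-1 conditional activation

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = top_idx == e
            if sel.any():
                out[sel] = top_p[sel].unsqueeze(-1) * expert(x[sel])
        return out, aux
```

In use, a training loop would add the returned aux term, scaled by a small coefficient, to the task loss; that coefficient and the fixed cluster assignment are likewise choices made for the sketch rather than details taken from the paper.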
Similar resources
A Mixture Model for Expert Finding
This paper addresses the issue of identifying persons with expert knowledge on a given topic. Traditional methods usually estimate the relevance between the query and the support documents of candidate experts using, for example, a language model. However, the language model lacks the ability to identify semantic knowledge, and thus some right experts cannot be found due to not occ...
Mixture of Expert Agents for Handling Imbalanced Data Sets
Many real-world data sets exhibit skewed class distributions in which almost all cases are allotted to a class and far fewer cases to a smaller, usually more interesting class. A classifier induced from an imbalanced data set has, typically, a low error rate for the majority class and an unacceptable error rate for the minority class. This paper firstly provides a systematic study on the variou...
On linear mixture of expert approaches to information retrieval
Knowledge-intensive organizations have a vast array of information contained in large document repositories. With the advent of E-commerce and corporate intranets/extranets, these repositories are expected to grow at a fast pace. This explosive growth has led to huge, fragmented, and unstructured document collections. Although it has become easier to collect and store information in document coll...
Simultaneous Feature and Expert Selection within Mixture of Experts
A useful strategy to deal with complex classification scenarios is the "divide and conquer" approach. The mixture of experts (MOE) technique makes use of this strategy by jointly training a set of classifiers, or experts, that are specialized in different regions of the input space. A global model, or gate function, complements the experts by learning a function that weights their relevance in d...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i11.26617